1 Setting up the time series

library(AER)
library(dplyr)
library(DBI)
library(readxl)
library(knitr)
library(xtable)
library(dbplyr)
library(TTR)
library(fpp2)
library(tsbox)
library(zoo)
library(lubridate)
library(forecast)
library(ggplot2)
library(tseries)
library(dygraphs)
library(stats)
library(vars)
library(gbm)
library(Hmisc)
library(DT)
library(xts)

Import the datasets

alarms<- read.csv('allarmi_30147127.csv')
pressures<- read.csv("p_30147127.csv")

2 Data Exploration

Let’s start with an exploratory data analysis: it is essential to gain a solid understanding of the datasets before modelling them.

datatable(head(pressures, 100), options = list(
  columnDefs = list(list(className = 'dt-center')),
  pageLength = 5
))

The pressures dataset contains a column named “ts”, holding a date and a time, and a “value” column with the pressure readings, rounded to two decimal places.

datatable(head(alarms, 100), options = list(
  columnDefs = list(list(className = 'dt-center')),
  pageLength = 5
))

The alarms dataset, on the other hand, contains a “date_time” column with a date and a time, and a column named “priorita” with the priority of the alarm that occurred.

str(alarms)
## 'data.frame':    171 obs. of  2 variables:
##  $ date_time: chr  "2018-04-10 07:40:42" "2018-04-10 07:42:23" "2018-11-13 09:22:19" "2018-11-14 10:03:21" ...
##  $ priorita : chr  "alta" "bassa" "bassa" "bassa" ...
str(pressures)
## 'data.frame':    20625 obs. of  2 variables:
##  $ ts   : chr  "2018-03-18 00:00:00" "2018-03-18 00:15:00" "2018-03-18 00:30:00" "2018-03-18 00:45:00" ...
##  $ value: num  23.7 23.7 23.7 23.7 23.7 ...
unique(alarms$priorita)
## [1] "alta"  "bassa" "media"

There are only three distinct values of “priorita”. The “date_time” and “ts” columns of the two datasets are stored as character, so it is useful to convert them to POSIXct. Furthermore, “priorita” is not usable as a character variable, so it will be recoded as numeric.

alarms$date_time = ymd_hms(alarms$date_time)
alarms$priorita = recode(alarms$priorita, "bassa"=1, "media"=2, "alta"=3)
pressures$ts = ymd_hms(pressures$ts)

The priorities are encoded as 1, 2 and 3 so that, in the merged dataset, 0 can represent the absence of an alarm, a condition that will necessarily appear because the two datasets have different sizes. It can be useful to plot the pressures dataset…

plot(pressures$ts, pressures$value, type="l")

… and to take a look at the summary statistics

summary(pressures)
##        ts                          value         
##  Min.   :2018-03-18 00:00:00   Min.   : 0.04514  
##  1st Qu.:2018-07-03 11:00:00   1st Qu.:23.34588  
##  Median :2018-08-17 08:00:00   Median :23.60788  
##  Mean   :2018-09-15 10:54:27   Mean   :23.60724  
##  3rd Qu.:2018-12-25 03:00:00   3rd Qu.:23.86989  
##  Max.   :2019-04-17 23:00:00   Max.   :26.19898
summary(alarms)
##    date_time                      priorita    
##  Min.   :2018-04-10 07:40:42   Min.   :1.000  
##  1st Qu.:2018-12-28 09:18:05   1st Qu.:2.000  
##  Median :2018-12-31 08:53:15   Median :2.000  
##  Mean   :2018-12-28 11:48:51   Mean   :2.205  
##  3rd Qu.:2019-01-08 10:39:05   3rd Qu.:3.000  
##  Max.   :2019-01-08 10:41:04   Max.   :3.000

Looking at the summary statistics and at the plot, it is clear that there is an outlier: a single very small value. It is useful to check whether this value corresponds to an alarm.

data = pressures[pressures$value==min(pressures$value),1]
data
## [1] "2018-04-10 07:42:04 UTC"
alarms[format(alarms$date_time, "%Y-%m-%d %H")==format(data, "%Y-%m-%d %H"),]
##             date_time priorita
## 1 2018-04-10 07:40:42        3
## 2 2018-04-10 07:42:23        1

There are alarms in correspondence with this timestamp, so it is possible that this pressure value actually occurred.
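
To make “in correspondence” concrete, the distance between the outlier timestamp and the alarms recorded on the same day can be computed; a minimal sketch using the objects defined above:

# time difference (in seconds) between the alarms recorded on the same day
# and the outlier timestamp stored in "data"
difftime(alarms$date_time[as.Date(alarms$date_time) == as.Date(data)], data, units = "secs")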

Check for duplicates

alarms[duplicated(alarms),]
## [1] date_time priorita
## <0 rows> (or 0-length row.names)
pressures[duplicated(pressures),]
## [1] ts    value
## <0 rows> (or 0-length row.names)

There are no duplicates in either dataset.

It is usually useful to check for missing values in the datasets.

library(naniar)
alarms %>%
  miss_var_summary()
##   variable  n_miss pct_miss
## 1 date_time      0        0
## 2 priorita       0        0
pressures %>%
  miss_var_summary()
##   variable n_miss pct_miss
## 1 ts            0        0
## 2 value         0        0

In these datasets there are no missing values.

3 Data Preprocessing

The alarm observations need to be grouped into 15-minute intervals, matching the sampling of the pressure data. The “priorita” column is summarised with max(), so that within each 15-minute window the highest priority observed is taken as the representative one.

alarms2 = alarms %>%
  mutate(date_time = floor_date(date_time, unit="15 mins")) %>%
  group_by(date_time) %>%
  summarise(priorita=max(priorita))

Now the two datasets can be merged. A full (outer) join is used, so that all pressure values and all alarm priorities are kept. The resulting NAs in the “priorita” column are then replaced with 0, the value that encodes the absence of an alarm.

df = merge(alarms2, pressures, by.x="date_time", by.y="ts", all=T)
df$priorita[is.na(df$priorita)] = 0
summary(df)
##    date_time                      priorita            value         
##  Min.   :2018-03-18 00:00:00   Min.   :0.000000   Min.   : 0.04514  
##  1st Qu.:2018-07-03 12:06:29   1st Qu.:0.000000   1st Qu.:23.34588  
##  Median :2018-08-17 08:13:15   Median :0.000000   Median :23.60788  
##  Mean   :2018-09-15 11:35:17   Mean   :0.001357   Mean   :23.60724  
##  3rd Qu.:2018-12-25 05:45:00   3rd Qu.:0.000000   3rd Qu.:23.86989  
##  Max.   :2019-04-17 23:00:00   Max.   :3.000000   Max.   :26.19898  
##                                                   NA's   :9

Some pressure values are NA and need to be imputed. One idea is to compute the mean pressure associated with the “alta”, “media” and “bassa” priority levels and use those means as replacements. This approach turned out not to be effective, so a last-observation-carried-forward (na.locf) procedure is used instead.

#m1 = mean(df$value[(df$priorita==1)&(!is.na(df$value))])
#m2 = mean(df$value[(df$priorita==2)&(!is.na(df$value))])
#m3 = mean(df$value[(df$priorita==3)&(!is.na(df$value))])
#m1
#m2
#m3

#df$value[(df$priorita==1)&(is.na(df$value))] = m1
#df$value[(df$priorita==2)&(is.na(df$value))] = m2
#df$value[(df$priorita==3)&(is.na(df$value))] = m3
# carry the last observed value forward to fill the remaining NAs
df = na.locf(df)

Creating xts and ts versions of the series will be useful for the following operations.

priorita_xts=xts(x=df$priorita, order.by=df$date_time)
value_xts=xts(x=df$value, order.by=df$date_time)
priorita_ts=as.ts(x=df$priorita, start=df$date_time[1])
value_ts=as.ts(x=df$value, start=df$date_time[1])

Regularity and stationarity are two of the main properties to check for a time series.

is.regular(priorita_xts)
## [1] TRUE
is.regular(value_xts)
## [1] TRUE
is.regular(priorita_xts, strict=T)
## [1] FALSE
is.regular(value_xts, strict=T)
## [1] FALSE

The time series are regular but not strictly regular: there is a regular underlying structure, but the observations are not all equally spaced.
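
A quick way to see this is to tabulate the gaps between consecutive timestamps; a minimal sketch:

# gaps (in minutes) between consecutive observations: a strictly regular
# series would show a single spacing
table(diff(as.numeric(index(value_xts))) / 60)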

adf.test(priorita_ts, alternative="stationary", k=0)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  priorita_ts
## Dickey-Fuller = -143.73, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary
adf.test(value_ts, alternative="stationary", k=0)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  value_ts
## Dickey-Fuller = -26.631, Lag order = 0, p-value = 0.01
## alternative hypothesis: stationary

The augmented Dickey-Fuller test rejects the unit-root null hypothesis for both series (p-value 0.01), so both can be treated as stationary.
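
As a cross-check, one could also run a KPSS test, whose null hypothesis is stationarity; a minimal sketch with tseries::kpss.test:

# KPSS test: the null hypothesis here is (level) stationarity, so a large
# p-value would point in the same direction as the ADF results above
kpss.test(value_ts, null = "Level")
kpss.test(priorita_ts, null = "Level")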

Setting an appropriate frequency is essential for effective forecasting.

frequency(value_xts)
## [1] 1
frequency(priorita_xts)
## [1] 1
periodicity(value_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00
periodicity(priorita_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00

The observed data have a 15-minute periodicity, so several seasonal frequencies are possible: a frequency of 96 (4*24) would correspond to a daily cycle, while a frequency of 4 corresponds to an hourly cycle. The hourly frequency of 4 is used here.

attr(value_xts, "frequency") = 4
attr(priorita_xts, "frequency") = 4
periodicity(value_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00
periodicity(priorita_xts)
## 15 minute periodicity from 2018-03-18 to 2019-04-17 23:00:00

There seems to be no need to filter the series before modelling.
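
If smoothing were needed, a simple moving average could be applied, for example with TTR::SMA; a minimal sketch, with an arbitrary 4-observation (one hour) window:

# one-hour simple moving average of the pressure series, shown only as an
# illustration; it is not used in the rest of the analysis
value_smooth <- SMA(value_xts, n = 4)
head(value_smooth, 8)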

acf(value_xts)

acf(priorita_xts)

pacf(value_xts)

pacf(priorita_xts)

4 Data models: ARIMA

The first model used to forecast the series is an ARIMA model, applied to the univariate time series of pressure values.

The first step is to split the dataset into two subsets: a training set and a validation set.

train = value_xts[index(value_xts) <= "2019-04-14"]
test = value_xts[index(value_xts) > "2019-04-14"]

4.1 Training

Searching for the best ARIMA model on the training set

arima <- auto.arima(train, trace= T)
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,1,2)(1,0,1)[4] with drift         : -13407.69
##  ARIMA(0,1,0)           with drift         : -10306.43
##  ARIMA(1,1,0)(1,0,0)[4] with drift         : -11132.52
##  ARIMA(0,1,1)(0,0,1)[4] with drift         : -11066.75
##  ARIMA(0,1,0)                              : -10308.43
##  ARIMA(2,1,2)(0,0,1)[4] with drift         : Inf
##  ARIMA(2,1,2)(1,0,0)[4] with drift         : Inf
##  ARIMA(2,1,2)(2,0,1)[4] with drift         : Inf
##  ARIMA(2,1,2)(1,0,2)[4] with drift         : Inf
##  ARIMA(2,1,2)           with drift         : Inf
##  ARIMA(2,1,2)(0,0,2)[4] with drift         : Inf
##  ARIMA(2,1,2)(2,0,0)[4] with drift         : Inf
##  ARIMA(2,1,2)(2,0,2)[4] with drift         : -13449.21
##  ARIMA(1,1,2)(2,0,2)[4] with drift         : Inf
##  ARIMA(2,1,1)(2,0,2)[4] with drift         : Inf
##  ARIMA(3,1,2)(2,0,2)[4] with drift         : -13705.36
##  ARIMA(3,1,2)(1,0,2)[4] with drift         : Inf
##  ARIMA(3,1,2)(2,0,1)[4] with drift         : Inf
##  ARIMA(3,1,2)(1,0,1)[4] with drift         : -13587.17
##  ARIMA(3,1,1)(2,0,2)[4] with drift         : -13481
##  ARIMA(3,1,3)(2,0,2)[4] with drift         : -13195.95
##  ARIMA(2,1,3)(2,0,2)[4] with drift         : -13101.3
##  ARIMA(3,1,2)(2,0,2)[4]                    : -13707.36
##  ARIMA(3,1,2)(1,0,2)[4]                    : Inf
##  ARIMA(3,1,2)(2,0,1)[4]                    : Inf
##  ARIMA(3,1,2)(1,0,1)[4]                    : -13589.15
##  ARIMA(2,1,2)(2,0,2)[4]                    : Inf
##  ARIMA(3,1,1)(2,0,2)[4]                    : -13482.96
##  ARIMA(3,1,3)(2,0,2)[4]                    : -13197.86
##  ARIMA(2,1,1)(2,0,2)[4]                    : Inf
##  ARIMA(2,1,3)(2,0,2)[4]                    : -13098.58
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(3,1,2)(2,0,2)[4]                    : -13720.79
## 
##  Best model: ARIMA(3,1,2)(2,0,2)[4]

Extracting the fitted values as an xts object

fit_arima<- xts(arima$fitted, order.by = index(train))

Visually checking the results of the fitting

confronto_arima <- cbind(fit_arima, train) 
dygraph(confronto_arima) %>% dyRangeSelector()

Checking the in-sample accuracy measures

accuracy(arima)
##                        ME      RMSE        MAE        MPE      MAPE      MASE
## Training set 9.283695e-05 0.1727744 0.02977672 -0.3902496 0.5797132 0.4910058
##                      ACF1
## Training set -0.007031916
# pseudo R2
cor(fitted(arima),train)^2
##           [,1]
## [1,] 0.9017884

4.2 Validation

Applying the selected ARIMA model to the validation set

predizione_arima <- predict(arima, nrow(test), prediction.interval = TRUE)
predizione_arima
## $pred
##          Qtr1     Qtr2     Qtr3     Qtr4
## 5101                   23.97979 23.98836
## 5102 23.99083 23.99679 24.00395 24.00735
## 5103 24.01043 24.00991 24.01005 24.00914
## 5104 24.00810 24.00687 24.00577 24.00493
## 5105 24.00439 24.00417 24.00419 24.00440
## 5106 24.00471 24.00506 24.00537 24.00562
## 5107 24.00579 24.00587 24.00587 24.00582
## 5108 24.00573 24.00563 24.00554 24.00546
## 5109 24.00541 24.00538 24.00538 24.00539
## 5110 24.00542 24.00545 24.00547 24.00550
## 5111 24.00551 24.00552 24.00552 24.00552
## 5112 24.00551 24.00550 24.00550 24.00549
## 5113 24.00548 24.00548 24.00548 24.00548
## 5114 24.00548 24.00549 24.00549 24.00549
## 5115 24.00549 24.00549 24.00549 24.00549
## 5116 24.00549 24.00549 24.00549 24.00549
## 5117 24.00549 24.00549 24.00549 24.00549
## 5118 24.00549 24.00549 24.00549 24.00549
## 5119 24.00549 24.00549 24.00549 24.00549
## 5120 24.00549 24.00549 24.00549 24.00549
## 5121 24.00549 24.00549 24.00549 24.00549
## 5122 24.00549 24.00549 24.00549 24.00549
## 5123 24.00549 24.00549 24.00549 24.00549
## 5124 24.00549 24.00549 24.00549 24.00549
## 5125 24.00549 24.00549 24.00549 24.00549
## 5126 24.00549 24.00549 24.00549 24.00549
## 5127 24.00549 24.00549 24.00549 24.00549
## 5128 24.00549 24.00549 24.00549 24.00549
## 5129 24.00549 24.00549 24.00549 24.00549
## 5130 24.00549 24.00549 24.00549 24.00549
## 5131 24.00549 24.00549 24.00549 24.00549
## 5132 24.00549 24.00549 24.00549 24.00549
## 5133 24.00549 24.00549 24.00549 24.00549
## 5134 24.00549 24.00549 24.00549 24.00549
## 5135 24.00549 24.00549 24.00549 24.00549
## 5136 24.00549 24.00549 24.00549 24.00549
## 5137 24.00549 24.00549 24.00549 24.00549
## 5138 24.00549 24.00549 24.00549 24.00549
## 5139 24.00549 24.00549 24.00549 24.00549
## 5140 24.00549 24.00549 24.00549 24.00549
## 5141 24.00549 24.00549 24.00549 24.00549
## 5142 24.00549 24.00549 24.00549 24.00549
## 5143 24.00549 24.00549 24.00549 24.00549
## 5144 24.00549 24.00549 24.00549 24.00549
## 5145 24.00549 24.00549 24.00549 24.00549
## 5146 24.00549 24.00549 24.00549 24.00549
## 5147 24.00549 24.00549 24.00549 24.00549
## 5148 24.00549 24.00549 24.00549 24.00549
## 5149 24.00549 24.00549 24.00549 24.00549
## 5150 24.00549 24.00549 24.00549 24.00549
## 5151 24.00549 24.00549 24.00549 24.00549
## 5152 24.00549 24.00549 24.00549 24.00549
## 5153 24.00549 24.00549 24.00549 24.00549
## 5154 24.00549 24.00549 24.00549 24.00549
## 5155 24.00549 24.00549 24.00549 24.00549
## 5156 24.00549 24.00549 24.00549 24.00549
## 5157 24.00549 24.00549 24.00549 24.00549
## 5158 24.00549 24.00549 24.00549 24.00549
## 5159 24.00549 24.00549                  
## 
## $se
##           Qtr1      Qtr2      Qtr3      Qtr4
## 5101                     0.1728168 0.2523054
## 5102 0.3176417 0.3596635 0.3979347 0.4183203
## 5103 0.4277992 0.4307304 0.4333349 0.4341933
## 5104 0.4345410 0.4346872 0.4348128 0.4350048
## 5105 0.4353660 0.4359993 0.4369502 0.4382334
## 5106 0.4397800 0.4414842 0.4432295 0.4449220
## 5107 0.4465022 0.4479468 0.4492610 0.4504674
## 5108 0.4515967 0.4526805 0.4537470 0.4548187
## 5109 0.4559109 0.4570317 0.4581829 0.4593608
## 5110 0.4605583 0.4617667 0.4629776 0.4641839
## 5111 0.4653807 0.4665653 0.4677372 0.4688974
## 5112 0.4700481 0.4711916 0.4723305 0.4734667
## 5113 0.4746019 0.4757368 0.4768718 0.4780066
## 5114 0.4791408 0.4802735 0.4814043 0.4825324
## 5115 0.4836575 0.4847793 0.4858978 0.4870129
## 5116 0.4881249 0.4892340 0.4903403 0.4914441
## 5117 0.4925455 0.4936447 0.4947417 0.4958364
## 5118 0.4969289 0.4980191 0.4991070 0.5001926
## 5119 0.5012758 0.5023566 0.5034350 0.5045110
## 5120 0.5055848 0.5066562 0.5077253 0.5087921
## 5121 0.5098568 0.5109192 0.5119794 0.5130375
## 5122 0.5140934 0.5151472 0.5161988 0.5172482
## 5123 0.5182956 0.5193408 0.5203839 0.5214250
## 5124 0.5224639 0.5235008 0.5245357 0.5255685
## 5125 0.5265992 0.5276280 0.5286548 0.5296795
## 5126 0.5307023 0.5317232 0.5327420 0.5337590
## 5127 0.5347740 0.5357870 0.5367982 0.5378075
## 5128 0.5388148 0.5398203 0.5408239 0.5418257
## 5129 0.5428256 0.5438237 0.5448199 0.5458143
## 5130 0.5468069 0.5477978 0.5487868 0.5497740
## 5131 0.5507595 0.5517432 0.5527252 0.5537054
## 5132 0.5546839 0.5556607 0.5566358 0.5576091
## 5133 0.5585808 0.5595508 0.5605190 0.5614857
## 5134 0.5624506 0.5634139 0.5643756 0.5653356
## 5135 0.5662940 0.5672508 0.5682060 0.5691595
## 5136 0.5701115 0.5710619 0.5720107 0.5729579
## 5137 0.5739036 0.5748477 0.5757903 0.5767313
## 5138 0.5776708 0.5786088 0.5795452 0.5804802
## 5139 0.5814136 0.5823456 0.5832760 0.5842050
## 5140 0.5851325 0.5860585 0.5869831 0.5879062
## 5141 0.5888278 0.5897481 0.5906669 0.5915842
## 5142 0.5925002 0.5934147 0.5943278 0.5952395
## 5143 0.5961499 0.5970588 0.5979664 0.5988725
## 5144 0.5997774 0.6006808 0.6015829 0.6024836
## 5145 0.6033830 0.6042811 0.6051778 0.6060732
## 5146 0.6069673 0.6078601 0.6087515 0.6096417
## 5147 0.6105305 0.6114181 0.6123044 0.6131894
## 5148 0.6140731 0.6149555 0.6158367 0.6167166
## 5149 0.6175953 0.6184727 0.6193489 0.6202238
## 5150 0.6210976 0.6219700 0.6228413 0.6237113
## 5151 0.6245802 0.6254478 0.6263142 0.6271794
## 5152 0.6280435 0.6289063 0.6297680 0.6306285
## 5153 0.6314878 0.6323459 0.6332029 0.6340587
## 5154 0.6349134 0.6357669 0.6366193 0.6374705
## 5155 0.6383207 0.6391696 0.6400175 0.6408642
## 5156 0.6417098 0.6425543 0.6433977 0.6442400
## 5157 0.6450811 0.6459212 0.6467602 0.6475981
## 5158 0.6484350 0.6492707 0.6501054 0.6509390
## 5159 0.6517715 0.6526030

By converting this prediction into an xts object and combining it with the observed series, the forecast can be plotted against the actual values.

# extract prediction values
predizione_arima_xts <- xts(predizione_arima$pred, order.by = index(test))
# combine original values + training + prediction
modello_arima <- cbind(predizione_arima_xts, test, train)

Plot

dygraph(modello_arima) %>% dyRangeSelector() %>% 
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))

It is also useful to check the accuracy measures of the forecast on the validation set.

accuracy(predizione_arima$pred , test)
##                   ME      RMSE        MAE         MPE      MAPE
## Test set -0.01770934 0.0597344 0.04656612 -0.07439064 0.1943499

4.3 Forecasting

end(test)
## [1] "2019-04-17 23:00:00 UTC"

The last timestamp is 2019-04-17 at 23:00:00. The forecast will cover the following 12 hours.

## Next time index
time_index <- seq(from = as.POSIXct("2019-04-17 23:00:00"), to = as.POSIXct("2019-04-18 11:00:00"), by="15 min")

# number of forecast periods
index(time_index)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
forecast = forecast(arima, length(time_index))
forecast_xts<- xts(forecast$mean, order.by = time_index)

Finally, the plot of the forecast

predizione<- cbind(forecast_xts, value_xts)


dygraph(predizione) %>% dyRangeSelector() %>% 
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))

5 VAR/VECM

The usual approach is to use Johansen’s method for testing whether or not cointegration exists. If the answer is “yes” then a vector error correction model (VECM), which combines levels and differences, can be estimated instead of a VAR in levels.

train = as.ts(df[df$date_time <= "2019-04-14",c(2,3)])
test = as.ts(df[df$date_time > "2019-04-14",c(2,3)])
train2 = xts(df[df$date_time <= "2019-04-14",], order.by=df$date_time[df$date_time <= "2019-04-14"])
test2 = xts(df[df$date_time > "2019-04-14",], order.by=df$date_time[df$date_time > "2019-04-14"])

5.1 Training

po.test(train)
## 
##  Phillips-Ouliaris Cointegration Test
## 
## data:  train
## Phillips-Ouliaris demeaned = -26178, Truncation lag parameter = 204,
## p-value = 0.01

The Phillips-Ouliaris test rejects the null hypothesis of no cointegration (p-value = 0.01), so the two series appear to be cointegrated. This is not surprising, since one series is strongly related to the other.

df_ts = as.ts(df[,c(2,3)])
var_select<- VARselect(df_ts, lag.max = 10, type = "both")
cointest <- ca.jo(train,K=var_select$selection[1],type = "eigen", ecdet = "const", spec = "transitory")
summary(cointest)
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration 
## 
## Eigenvalues (lambda):
## [1] 8.785177e-02 2.196131e-02 6.152169e-21
## 
## Values of teststatistic and critical values of test:
## 
##             test 10pct  5pct  1pct
## r <= 1 |  452.83  7.52  9.24 12.97
## r = 0  | 1875.10 13.75 15.67 20.20
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##              priorita.l1    value.l1    constant
## priorita.l1  1.000000000   1.0000000    1.000000
## value.l1    -0.004417625   0.6210115    3.855939
## constant     0.102896872 -14.6593179 2634.952726
## 
## Weights W:
## (This is the loading matrix)
## 
##            priorita.l1     value.l1     constant
## priorita.d  -0.8985978 -0.002474699 1.180535e-21
## value.d      0.1762669 -0.082125830 3.637673e-20

The Johansen procedure indicates that the rank of the cointegration matrix is 2, since both the hypotheses r = 0 and r <= 1 are rejected.

vecm <- cajorls(cointest, r=1)

# transform the VECM into its VAR representation
var <- vec2var(cointest)

5.2 Validation

# predict over the validation horizon
forecast_dfb <- predict(var, n.ahead=232) # forecast from the VAR representation of the VECM
#forecast_dfb$fcst$value[,1]  # predicted values

# xts and ts class
predizione_dfb <- xts(forecast_dfb$fcst$value[,1] , order.by = index(test2))
predizione_dfbts <- ts(predizione_dfb)

# combine training and predicted values
pred_dfb <- cbind(predizione_dfb, test2$value, train2$value)
accuracy(predizione_dfbts, as.numeric(test2$value))
##                   ME       RMSE        MAE        MPE      MAPE
## Test set 0.008751104 0.05710784 0.04692652 0.03592482 0.1956478
dygraph(pred_dfb) %>% dyRangeSelector() %>% 
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))

5.3 Forecast

time_index <- seq(from = as.POSIXct("2019-04-17 23:00:00"), to = as.POSIXct("2019-04-18 11:00:00"), by="15 min")
index(time_index)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
forecast <- predict(var,n.ahead=length(time_index))

predizione_p_xts <- xts(forecast$fcst$value[,1] , order.by = time_index)

df_xts = xts(df, order.by=df$date_time)

predizione <- cbind(predizione_p_xts,df_xts$value)

dygraph(predizione) %>% dyRangeSelector() %>% 
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set1"))

Accuracy measures of ARIMA(3,1,2)(2,0,2)[4]:

##                   ME      RMSE        MAE         MPE      MAPE
## Test set -0.01770934 0.0597344 0.04656612 -0.07439064 0.1943499

Accuracy measures of VECM:

##                   ME       RMSE        MAE        MPE      MAPE
## Test set 0.008751104 0.05710784 0.04692652 0.03592482 0.1956478

To summarise, the two models perform very similarly on the validation set: the ARIMA model has a slightly lower MAE and MAPE, while the VECM has a slightly lower RMSE and a smaller bias (ME).
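
For readability, the two sets of validation measures can be collected into a single table; a minimal sketch (confronto_modelli is just an illustrative name, and the numbers are copied from the accuracy outputs above):

# side-by-side comparison of the validation accuracy of the two models
confronto_modelli <- rbind(
  ARIMA = c(ME = -0.01770934, RMSE = 0.05973440, MAE = 0.04656612, MPE = -0.07439064, MAPE = 0.1943499),
  VECM  = c(ME =  0.00875110, RMSE = 0.05710784, MAE = 0.04692652, MPE =  0.03592482, MAPE = 0.1956478)
)
round(confronto_modelli, 4)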

df[df$priorita==1,3]
## [1] 23.69148 23.69148
df[df$priorita==2,3]
## [1] 23.70736
df[df$priorita==3,3]
## [1] 23.49000 24.01327 24.34019 24.22577 26.16302 23.80077 24.02588 24.20522

With the exception of a single value, the thresholds seem to be:

“bassa” -> 23.6
“media” -> 23.7
“alta” -> 23.8

colnames(predizione_p_xts) = "Pressione"
predizione_p_xts$Stato = ifelse(predizione_p_xts$Pressione<23.6, "normale", ifelse(predizione_p_xts$Pressione<23.7, "bassa", ifelse(predizione_p_xts$Pressione<23.8, "media", "alta")))
predizione_p_xts
##                     Pressione Stato
## 2019-04-17 23:00:00  23.98973    NA
## 2019-04-17 23:15:00  23.99059    NA
## 2019-04-17 23:30:00  23.99202    NA
## 2019-04-17 23:45:00  23.99023    NA
## 2019-04-18 00:00:00  23.98902    NA
## 2019-04-18 00:15:00  23.98549    NA
## 2019-04-18 00:30:00  23.97818    NA
## 2019-04-18 00:45:00  23.97554    NA
## 2019-04-18 01:00:00  23.97306    NA
## 2019-04-18 01:15:00  23.97337    NA
## 2019-04-18 01:30:00  23.97365    NA
## 2019-04-18 01:45:00  23.97558    NA
## 2019-04-18 02:00:00  23.97840    NA
## 2019-04-18 02:15:00  23.97987    NA
## 2019-04-18 02:30:00  23.98095    NA
## 2019-04-18 02:45:00  23.98099    NA
## 2019-04-18 03:00:00  23.98085    NA
## 2019-04-18 03:15:00  23.97991    NA
## 2019-04-18 03:30:00  23.97877    NA
## 2019-04-18 03:45:00  23.97804    NA
## 2019-04-18 04:00:00  23.97756    NA
## 2019-04-18 04:15:00  23.97749    NA
## 2019-04-18 04:30:00  23.97758    NA
## 2019-04-18 04:45:00  23.97801    NA
## 2019-04-18 05:00:00  23.97849    NA
## 2019-04-18 05:15:00  23.97885    NA
## 2019-04-18 05:30:00  23.97907    NA
## 2019-04-18 05:45:00  23.97912    NA
## 2019-04-18 06:00:00  23.97907    NA
## 2019-04-18 06:15:00  23.97888    NA
## 2019-04-18 06:30:00  23.97867    NA
## 2019-04-18 06:45:00  23.97850    NA
## 2019-04-18 07:00:00  23.97840    NA
## 2019-04-18 07:15:00  23.97836    NA
## 2019-04-18 07:30:00  23.97839    NA
## 2019-04-18 07:45:00  23.97847    NA
## 2019-04-18 08:00:00  23.97857    NA
## 2019-04-18 08:15:00  23.97865    NA
## 2019-04-18 08:30:00  23.97870    NA
## 2019-04-18 08:45:00  23.97871    NA
## 2019-04-18 09:00:00  23.97870    NA
## 2019-04-18 09:15:00  23.97866    NA
## 2019-04-18 09:30:00  23.97862    NA
## 2019-04-18 09:45:00  23.97859    NA
## 2019-04-18 10:00:00  23.97856    NA
## 2019-04-18 10:15:00  23.97855    NA
## 2019-04-18 10:30:00  23.97856    NA
## 2019-04-18 10:45:00  23.97857    NA
## 2019-04-18 11:00:00  23.97859    NA
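
The “Stato” column prints as NA because an xts object stores a single numeric matrix, so the character labels are silently coerced to NA. Keeping the labels requires a separate data structure; a minimal sketch with a data.frame and the same thresholds (stato_df is just an illustrative name):

# an xts object cannot hold character columns, so the state labels are rebuilt
# in a data.frame; cut() reproduces the nested ifelse() thresholds used above
stato_df <- data.frame(
  ts        = index(predizione_p_xts),
  Pressione = as.numeric(predizione_p_xts$Pressione),
  Stato     = cut(as.numeric(predizione_p_xts$Pressione),
                  breaks = c(-Inf, 23.6, 23.7, 23.8, Inf),
                  labels = c("normale", "bassa", "media", "alta"),
                  right  = FALSE)
)
head(stato_df)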